An example file, data.utx, is included in the Samples directory.

TBX-Glossary and convert_glossary were developed for UTX-Simple
1.00. Since then, the UTX-S standard has advanced to version 1.10,
gaining certain features which are not compatible with our software. To
prevent incompatibilities, we describe them here:

First, convert_glossary cannot skip additional, descriptive lines in
the UTX header. The header must consist of exactly two lines, and the
column definitions must be in the second. However, additional
descriptions can be included as a glossary-wide note, as described below.

Second, UTX-S 1.10 provides for bidirectionality, grouping by concept ID,
and term status. None of these data categories are convertible according
to our current design, which was based on UTX-S 1.00, and all of this
information will be lost in conversion. Entries are converted as though
they were all separate concepts, all monodirectional, and all approved
terms. In order to produce a file with no forbidden terms, with only
approved terms, etc., one must pre-filter the UTX.
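
As an illustration of such pre-filtering, here is a minimal sketch in
Python (not part of our software). It keeps only the rows whose value
in one named column matches a wanted value. It assumes that header lines
start with '#', that the last header line holds the tab-separated column
names, and that the caller supplies the column name and value; the
function name filter_rows is hypothetical. It does not rewrite the
header into the two-line UTX-S 1.00 form.

```python
def filter_rows(lines, column, wanted):
    """Keep header lines plus the data rows where `column` equals `wanted`.

    Assumptions: header lines start with '#', and the last header line
    holds the tab-separated column names.
    """
    header = [line for line in lines if line.startswith("#")]
    body = [line for line in lines if line and not line.startswith("#")]
    if not header:
        raise ValueError("no header lines found")

    names = header[-1].lstrip("#").split("\t")
    if column not in names:
        raise ValueError("column not found: " + column)
    index = names.index(column)

    kept = []
    for row in body:
        cells = row.split("\t")
        # Rows too short to contain the column are dropped.
        if index < len(cells) and cells[index].strip() == wanted:
            kept.append(row)
    return header + kept
```

For example, calling filter_rows(lines, "term status", "approved") would
keep only approved terms, provided the status column in the file really
is headed "term status" and uses the value "approved"; check both
against the file before relying on them.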

In the remainder of this documentation, 'UTX' refers to UTX-Simple 1.00.

This document also describes our quick-input format, illustrated by
another sample, data.txt. This format is identical to UTX, except that
(a) quick-input files may omit any UTX element not mentioned below,
even if it is mandatory in proper UTX, and (b) quick-input provides
several conveniences for entering the mandatory part-of-speech data.

For both UTX and quick-input, the converter relaxes one requirement of
the UTX specification: Lines may be terminated with whatever end-of-line
code the local Perl recognizes, not solely with the canonical carriage
return and line feed. On Unix-like systems, where the local end-of-line
code is line feed alone, it will also accept the canonical version. The
converter does not, however, waive UTX's prohibition of files starting
with a byte-order mark.
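
The two rules above can be sketched as follows. This is an illustration
in Python, not the converter's own Perl code. The function name
read_utx_lines is hypothetical, and the sketch assumes the file is
encoded in UTF-8.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def read_utx_lines(raw):
    """Split raw file bytes into lines, refusing a leading byte-order mark."""
    if raw.startswith(UTF8_BOM):
        raise ValueError("file must not start with a byte-order mark")

    # Assumption: UTF-8 content. Treat CR+LF (canonical) and LF alone alike.
    text = raw.decode("utf-8").replace("\r\n", "\n")
    lines = text.split("\n")

    # A final line terminator leaves one empty trailing element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```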

Source and target languages are expressed in the first header line,
as in all UTX.

The subject field is expressed in the first header line, as a field
that UTX treats as optional, indicated by the key word 'subject'. It is
nevertheless mandatory for convertibility.

A glossary-wide note is expressed in the first header line, as an optional
field with the key word 'comment'.
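
To illustrate how these three pieces of the first header line fit
together, here is a minimal Python sketch (not the converter's code).
It assumes that fields are separated by semicolons, that the second
field is the language pair written as source/target, and that optional
fields take the form 'key: value'. Check this layout against the UTX
specification and the sample file. The function name
parse_first_header_line is hypothetical.

```python
def parse_first_header_line(line):
    """Extract languages, subject, and comment from the first header line.

    Assumed layout: '#<format>; <src>/<tgt>; ...; subject: X; comment: Y'
    """
    fields = [field.strip() for field in line.lstrip("#").split(";")]
    if len(fields) < 2 or "/" not in fields[1]:
        raise ValueError("expected a source/target language pair")

    source, target = fields[1].split("/", 1)
    info = {"source": source.strip(), "target": target.strip()}

    for field in fields[2:]:
        key, sep, value = field.partition(":")
        # Only the two key words discussed in this document are collected.
        if sep and key.strip() in ("subject", "comment"):
            info[key.strip()] = value.strip()

    # The subject is optional in UTX but required for conversion.
    if "subject" not in info:
        raise ValueError("a subject field is required for conversion")
    return info
```

A comment that itself contains a semicolon would be cut short by this
simple split; the sketch makes no attempt to handle that case.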

Source and target terms are expressed as in all UTX.

In convertible UTX, source part-of-speech is expressed as in all UTX. In
the quick-input format, it can be implied: a blank in the src:pos column,
or that column's entire absence, indicates that the source term is a noun.

Target part-of-speech is mandatory for convertibility. In convertible
UTX, it may be expressed explicitly in a tgt:pos column, or implicitly:
A blank in the tgt:pos column, or that column's absence, indicates
that the part of speech is the same as in the source language. In the
quick-input format, a third option joins these two: The 'note' field can
contain the tag 'tgt:pos:' followed by a part of speech. This special
note formatting will override the implicit same-as-source assumption
(but will not override an explicit tgt:pos in its proper place). This is
designed to allow the quick-input user to avoid keying a tgt:pos column;
implicit same-as-source covers the most common case, and special note
formatting covers the exceptions. (The tgt:pos portion of the note field
is removed before the note is processed further.)
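
The precedence among these options in the quick-input format can be
sketched as below. This is a Python illustration, not the converter's
own code. The function name resolve_quick_input_pos is hypothetical. The
sketch assumes that the tag is written as 'tgt:pos:' followed by
optional spaces and a single word, and that the tag is stripped from the
note even when an explicit tgt:pos takes precedence over it.

```python
import re

# Assumed tag shape: 'tgt:pos:' + optional whitespace + one word.
TGT_POS_TAG = re.compile(r"tgt:pos:\s*(\w+)")


def resolve_quick_input_pos(src_pos, tgt_pos, note):
    """Return (src_pos, tgt_pos, note) with the implicit values filled in."""
    src_pos = (src_pos or "").strip()
    tgt_pos = (tgt_pos or "").strip()
    note = note or ""

    # A blank or absent src:pos means the source term is a noun.
    if not src_pos:
        src_pos = "noun"

    # Remove the special tag from the note before any further processing.
    match = TGT_POS_TAG.search(note)
    if match:
        note = (note[:match.start()] + note[match.end():]).strip()
        # The tag never overrides an explicit tgt:pos column value.
        if not tgt_pos:
            tgt_pos = match.group(1)

    # With neither an explicit value nor a tag, use the source's value.
    if not tgt_pos:
        tgt_pos = src_pos
    return src_pos, tgt_pos, note
```

For instance, a row with src:pos 'verb', no tgt:pos, and the note
'tgt:pos: noun used in legal texts' resolves to a verb in the source
language and a noun in the target language, leaving the note
'used in legal texts'.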

The convertible part-of-speech values are adjective, adverb, noun,
properNoun, and verb. Sentence is not a convertible part of speech.
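
A check against this list might look like the following Python sketch
(an illustration, not the converter's code). The name
is_convertible_pos is hypothetical, and the sketch assumes that values
must match the spellings above exactly, including case.

```python
CONVERTIBLE_POS = frozenset(
    {"adjective", "adverb", "noun", "properNoun", "verb"}
)


def is_convertible_pos(pos):
    """Report whether a part-of-speech value is one of the convertible five."""
    # Assumption: exact, case-sensitive comparison.
    return pos in CONVERTIBLE_POS
```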

The remaining data categories (source-language note on an entry,
definition, source of definition, contextual example, and source of
contextual example) are convertible but not mandatory. They must appear
in columns headed by the correct abbreviations, as seen in the sample
file. Per the UTX standard, columns after the mandatory three may appear
in any order so long as they are consistent within a file.

When UTX is selected as the output format, the converter will produce
files conforming to the UTX-Simple 1.00 specification and the above
requirements, with this exception: Language tags in the RFC 4646 format
will neither be expanded nor reduced to conform to the narrower xx-XX
format shown in the UTX spec. This may be done by hand if desired.
